
    Scene Parsing using Multiple Modalities

    Scene parsing is the task of assigning a semantic class label to the elements of a scene. It has many applications in autonomous systems that need to understand the visual data captured from their environment. Different sensing modalities, such as RGB cameras, multi-spectral cameras and Lidar sensors, can be beneficial when pursuing this goal. Scene analysis using multiple modalities aims at leveraging the complementary information they capture: when multiple modalities are used together, the strengths of each modality can compensate for the weaknesses of the others. However, the possible gains of using multiple modalities come with new challenges, such as dealing with misalignments between them. In this thesis, our aim is to take advantage of multiple modalities to improve outdoor scene parsing and to address the associated challenges.

    We initially investigate the potential of multi-spectral imaging for outdoor scene analysis. Our approach combines the discriminative strength of the multi-spectral signature at each pixel with the nature of the surrounding texture. Many materials that appear similar when viewed by a common RGB camera show discriminating properties when viewed by a camera capturing a greater number of separated wavelengths. When using imagery for scene parsing, a number of challenges stem from, e.g., color saturation, shadows and occlusion. To address such challenges, we focus on scene parsing using multiple modalities, panoramic RGB images and 3D Lidar data in particular, and propose a multi-view approach that selects the best 2D view describing each element of the 3D point cloud.

    Keeping our focus on multiple modalities, we then introduce a multi-modal graphical model to address scene parsing with 2D-3D data exhibiting extensive many-to-one correspondences. Existing methods often impose a hard correspondence between the 2D and 3D data, where corresponding 2D and 3D regions are forced to receive identical labels. This results in performance degradation due to misalignments, 3D-2D projection errors and occlusions. We address this issue by defining a graph over the entire set of data that models soft correspondences between the two modalities. This graph encourages each region in one modality to leverage the information from its corresponding regions in the other modality to better estimate its class label. Finally, we introduce latent nodes to explicitly model inconsistencies between the modalities. The latent nodes allow us not only to leverage information from various domains to improve the labeling of the modalities, but also to cut the edges between inconsistent regions. To eliminate the need for hand-tuning the parameters of our model, we propose to learn the potential functions from training data. In addition to demonstrating the benefits of the proposed approaches on publicly available multi-modality datasets, we introduce a new multi-modal dataset of panoramic images and 3D point cloud data captured from outdoor scenes (the NICTA/2D3D Dataset).
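
    To make the role of the latent nodes concrete, here is a minimal Python sketch of the energy of a single cross-modal edge gated by a binary latent consistency variable; the function names, penalty form and constants (lam, mu) are illustrative assumptions, not the thesis's learned potentials.

        # Hypothetical energy of one 2D-3D edge gated by a latent node z.
        # lam and mu are made-up constants; the thesis learns its potentials.
        def edge_energy(label_2d, label_3d, z, lam=1.0, mu=0.5):
            # z == 1: the regions are treated as consistent, so disagreeing
            # labels pay only a soft penalty lam (a soft, not hard,
            # correspondence). z == 0: the edge is cut at a fixed cost mu,
            # which keeps edges from being cut indiscriminately.
            if z == 1:
                return lam if label_2d != label_3d else 0.0
            return mu

        # At inference the latent node is optimized jointly with the labels;
        # in this toy form each edge simply takes the cheaper of its states.
        def min_edge_energy(label_2d, label_3d, lam=1.0, mu=0.5):
            return min(edge_energy(label_2d, label_3d, 1, lam, mu),
                       edge_energy(label_2d, label_3d, 0, lam, mu))

    Under this toy energy, a grossly misaligned 2D-3D pair pays mu once and stops propagating labels across the edge, which is the edge-cutting behavior the latent nodes are meant to provide.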

    Sample and Filter: Nonparametric Scene Parsing via Efficient Filtering

    Scene parsing has attracted a lot of attention in computer vision. While parametric models have proven effective for this task, they cannot easily incorporate new training data. By contrast, nonparametric approaches, which bypass any learning phase and directly transfer the labels from the training data to the query images, can readily exploit new labeled samples as they become available. Unfortunately, because of the computational cost of their label transfer procedures, state-of-the-art nonparametric methods typically filter out most training images to keep only a few relevant ones for labeling the query. As such, these methods throw away many images that still contain valuable information and generally obtain an unbalanced set of labeled samples. In this paper, we introduce a nonparametric approach to scene parsing that follows a sample-and-filter strategy. More specifically, we propose to sample labeled superpixels according to an image similarity score, which allows us to obtain a balanced set of samples. We then formulate label transfer as an efficient filtering procedure, which lets us exploit more labeled samples than existing techniques. Our experiments evidence the benefits of our approach over state-of-the-art nonparametric methods on two benchmark datasets. (Comment: please refer to the CVPR 2016 version of this manuscript.)
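
    A rough Python sketch of the sample-and-filter idea follows; the array layout, the per-class sampling and the dense Gaussian filtering below are simplified, assumed stand-ins for the paper's actual sampling scheme and efficient filtering procedure.

        import numpy as np

        def sample_and_filter(train_feats, train_labels, image_sim,
                              query_feats, n_classes, per_class=200,
                              bandwidth=1.0, seed=0):
            # train_feats: (K, D) superpixel features; train_labels: (K,);
            # image_sim: (K,) positive similarity of each superpixel's
            # source image to the query; query_feats: (M, D).
            rng = np.random.default_rng(seed)
            picked = []
            for c in range(n_classes):
                idx = np.flatnonzero(train_labels == c)
                if idx.size == 0:
                    continue
                # Sample class-c superpixels with probability proportional
                # to image similarity, yielding a balanced sample set.
                p = image_sim[idx] / image_sim[idx].sum()
                k = min(per_class, idx.size)
                picked.append(rng.choice(idx, size=k, replace=False, p=p))
            picked = np.concatenate(picked)
            feats, labels = train_feats[picked], train_labels[picked]

            # Label transfer as filtering: every query superpixel gathers
            # a Gaussian-weighted vote from every sampled superpixel.
            d2 = ((query_feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
            w = np.exp(-d2 / (2.0 * bandwidth ** 2))
            votes = np.stack([w[:, labels == c].sum(1)
                              for c in range(n_classes)], axis=1)
            return votes.argmax(1)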

    Classification of materials in natural scenes using multi-spectral images

    In this paper, a method for distinguishing between different materials occurring in natural scenes using a multi-spectral camera is devised. Such a capability is useful in autonomous robot applications, helping the robot negotiate its environment.
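
    As a minimal illustration of per-pixel material classification from a multi-spectral cube, the sketch below treats each pixel's B-band spectral signature as a feature vector; the random-forest classifier and array shapes are assumptions for the sketch, not the method devised in the paper.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def train_material_classifier(cube, labels):
            # cube: (H, W, B) multi-spectral image; labels: (H, W) material ids.
            X = cube.reshape(-1, cube.shape[-1])   # one B-band vector per pixel
            return RandomForestClassifier(n_estimators=100).fit(X, labels.ravel())

        def classify_materials(clf, cube):
            pred = clf.predict(cube.reshape(-1, cube.shape[-1]))
            return pred.reshape(cube.shape[:2])    # per-pixel material map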

    Classification of natural scene multi spectral images using a new enhanced CRF

    In this paper, a new enhanced CRF for discriminating between different materials in natural scenes using terrestrial multi-spectral imaging is established. Most existing formulations of the CRF suffer from over-smoothing and the loss of small details.
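
    One standard remedy for over-smoothing, sketched below in Python, is to make the pairwise smoothness weight contrast-sensitive, so that smoothing is cheap inside spectrally homogeneous regions and expensive across spectral boundaries where small details live; this classic Potts-style weighting is an illustration only, not the specific enhancement proposed in the paper.

        import numpy as np

        def pairwise_weights(cube, beta=None):
            # cube: (H, W, B) multi-spectral image. Returns smoothness
            # weights for horizontal and vertical 4-connected neighbors.
            dh = ((cube[:, 1:] - cube[:, :-1]) ** 2).sum(-1)   # (H, W-1)
            dv = ((cube[1:, :] - cube[:-1, :]) ** 2).sum(-1)   # (H-1, W)
            if beta is None:
                # Common heuristic: scale by the mean spectral contrast.
                mean_d = np.concatenate([dh.ravel(), dv.ravel()]).mean()
                beta = 1.0 / (2.0 * mean_d + 1e-12)
            # Large spectral difference -> small weight -> a label change
            # across that boundary is cheap, so fine details survive.
            return np.exp(-beta * dh), np.exp(-beta * dv)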

    Multi-view terrain classification using panoramic imagery and LIDAR

    The focus of this work is addressing the challenges of performing object recognition in real-world scenes as captured by a commercial, state-of-the-art surveying vehicle equipped with a 360° panoramic camera in conjunction with a 3D laser scanner (LIDAR).
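
    Fusing panoramic imagery with LIDAR requires projecting each 3D point into the 360° image. The sketch below assumes an ideal equirectangular camera model with points already expressed in the camera frame; the vehicle's real calibration and camera model will be more involved.

        import numpy as np

        def project_to_panorama(points, width, height):
            # points: (N, 3) xyz in the camera frame -> (N, 2) pixel coords.
            x, y, z = points[:, 0], points[:, 1], points[:, 2]
            r = np.maximum(np.linalg.norm(points, axis=1), 1e-12)
            azimuth = np.arctan2(y, x)        # horizontal angle, [-pi, pi]
            elevation = np.arcsin(z / r)      # vertical angle, [-pi/2, pi/2]
            u = (azimuth + np.pi) / (2 * np.pi) * width
            v = (np.pi / 2 - elevation) / np.pi * height
            return np.stack([u, v], axis=1)

    Each projected point can then be tagged with the image region it lands in; a multi-view scheme repeats this for every panorama that sees the point and keeps the most informative view.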

    A Multi-modal Graphical Model for Scene Analysis

    In this paper, we introduce a multi-modal graphical model to address the problem of semantic segmentation using 2D-3D data exhibiting extensive many-to-one correspondences. Existing methods often impose a hard correspondence between the 2D and 3D data, where corresponding 2D and 3D regions are forced to receive identical labels. This results in performance degradation due to misalignments, 3D-2D projection errors and occlusions. We address this issue by defining a graph over the entire set of data that models soft correspondences between the two modalities. This graph encourages each region in one modality to leverage the information from its corresponding regions in the other modality to better estimate its class label. We evaluate our method on a publicly available dataset and outperform the state of the art. Additionally, to demonstrate the ability of our model to support multiple correspondences between objects in the 3D and 2D domains, we introduce a new multi-modal dataset composed of panoramic images and LIDAR data, featuring a rich set of many-to-one correspondences.
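
    The contrast with hard correspondence can be made explicit in a toy Python energy: rather than forcing each linked 2D and 3D region to share a label, a disagreement only adds a finite penalty, so a single misaligned or occluded link cannot corrupt both regions. The potentials and the weight lam are illustrative; the paper's model is learned and solved with proper graph inference.

        import numpy as np

        def soft_correspondence_energy(unary_2d, unary_3d, links,
                                       labels_2d, labels_3d, lam=1.0):
            # unary_2d: (N2, C), unary_3d: (N3, C) per-region label costs;
            # links: iterable of (i_2d, j_3d) pairs, possibly many-to-one;
            # labels_*: integer label arrays for the two modalities.
            e = unary_2d[np.arange(len(labels_2d)), labels_2d].sum()
            e += unary_3d[np.arange(len(labels_3d)), labels_3d].sum()
            for i, j in links:
                # Soft correspondence: disagreement costs lam instead of
                # being forbidden, as a hard correspondence would make it.
                if labels_2d[i] != labels_3d[j]:
                    e += lam
            return e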

    Deep phenotyping: Deep learning for temporal phenotype/genotype classification

    Background: High-resolution, high-throughput genotype-to-phenotype studies in plants are underway to accelerate the breeding of climate-ready crops. In recent years, deep learning techniques, in particular Convolutional Neural Networks (CNNs), Recurrent Neural Networks and Long Short-Term Memories (LSTMs), have shown great success in visual data recognition, classification and sequence learning tasks. More recently, CNNs have been used for plant classification and phenotyping using individual static images of the plants. On the other hand, the dynamic behavior of plants and their growth have been important phenotypes for plant biologists, which motivated us to study the potential of LSTMs for encoding this temporal information for the accession classification task, useful in automating plant production and care.

    Methods: In this paper, we propose a CNN-LSTM framework for classifying plants of various genotypes. We exploit the power of deep CNNs for automatic joint feature and classifier learning, compared with using hand-crafted features. In addition, we leverage the potential of LSTMs to model the growth and dynamic behavior of plants as important discriminative phenotypes for accession classification. Moreover, we collected a dataset of time-series image sequences of four accessions of Arabidopsis, captured under similar imaging conditions, which can serve as a standard benchmark for researchers in the field. We have made this dataset publicly available.

    Conclusion: The results provide evidence of the benefits of our accession classification approach over traditional hand-crafted image analysis features and other accession classification frameworks. We also demonstrate that utilizing temporal information with LSTMs further improves the performance of the system. The proposed framework can be used in other applications, such as classifying plants under given environmental conditions or distinguishing diseased plants from healthy ones.

    We thank funding sources including the Australian Research Council (ARC) Centre of Excellence in Plant Energy Biology (CE140100008), ARC Linkage Grant LP140100572 and the National Collaborative Research Infrastructure Scheme - Australian Plant Phenomics Facility.
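
    A minimal PyTorch sketch of a CNN-LSTM accession classifier is shown below, assuming fixed-length RGB image sequences; the tiny backbone, feature sizes and four-way head are placeholders rather than the architecture reported in the paper.

        import torch
        import torch.nn as nn

        class CNNLSTMClassifier(nn.Module):
            def __init__(self, n_classes=4, feat_dim=128, hidden=64):
                super().__init__()
                self.cnn = nn.Sequential(      # tiny per-frame feature extractor
                    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(32, feat_dim), nn.ReLU(),
                )
                self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_classes)

            def forward(self, x):              # x: (B, T, 3, H, W)
                b, t = x.shape[:2]
                feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
                _, (h, _) = self.lstm(feats)   # encode growth over time
                return self.head(h[-1])        # logits: (B, n_classes)

        # Smoke test on a random batch of two 8-frame 64x64 sequences:
        logits = CNNLSTMClassifier()(torch.randn(2, 8, 3, 64, 64))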